SeqDBSCAN: A New Sequence DBSCAN Algorithm for Clustering of Web Usage Data
Author
Abstract
The Web is a vast area for data mining research. Web data mining is used to find user access patterns from web access logs. User page visits are sequential in nature. In this paper, we propose a new clustering algorithm, SeqDBSCAN, for clustering sequential data. We adopt a similarity-preserving function called the sequence and set similarity measure (SM), which captures both the order of occurrence of page visits and the content of the pages. We conducted experiments comparing the results of SeqDBSCAN with other similarity measures: S3M, Euclidean, and Jaccard. The clusters resulting from these measures are evaluated using a cluster validation technique called average Levenshtein distance (ALD). We tested the new algorithm on the MSNBC dataset and showed that the inter-cluster similarity is high with SM when compared to the Euclidean and Jaccard distance measures. A further set of experiments investigates whether clustering performance is affected by different sequence representations, different distance measures, and other factors such as the number of web pages, the similarity between clusters, the number of user sessions, and the number of clusters to form.
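The abstract does not give the formulas involved, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the sequence and set similarity is a weighted sum of a longest-common-subsequence ratio (order of page visits) and a Jaccard ratio (pages visited), with a hypothetical weight `p`; it assumes ALD is the mean pairwise Levenshtein distance within one cluster; and it uses textbook DBSCAN with distance taken as 1 minus similarity. All function and parameter names are illustrative.

```python
from itertools import combinations

def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def seq_set_similarity(a, b, p=0.5):
    """ASSUMED form: p * order similarity (LCS) + (1 - p) * set similarity (Jaccard)."""
    if not a or not b:
        return 0.0
    order_sim = lcs_length(a, b) / max(len(a), len(b))
    sa, sb = set(a), set(b)
    set_sim = len(sa & sb) / len(sa | sb)
    return p * order_sim + (1 - p) * set_sim

def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def average_levenshtein(cluster):
    """ASSUMED definition of ALD: mean pairwise edit distance inside one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)

def dbscan_sequences(sessions, eps, min_pts, p=0.5):
    """Textbook DBSCAN over sessions; distance = 1 - similarity. Label -1 is noise."""
    n = len(sessions)

    def neighbors(i):
        return [j for j in range(n)
                if 1.0 - seq_set_similarity(sessions[i], sessions[j], p) <= eps]

    labels = [None] * n
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Toy sessions: each list is the ordered page IDs visited in one session.
sessions = [[1, 2, 3], [1, 2, 3, 4], [1, 2, 3],
            [7, 8, 9], [7, 8, 9, 6], [7, 8, 9], [5]]
labels = dbscan_sequences(sessions, eps=0.3, min_pts=2)
```

The neighbourhood search here recomputes similarities on every call, which is quadratic per query and only suitable for small examples; a precomputed distance matrix would be the usual choice for a dataset the size of MSNBC.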
Similar resources
A density based clustering approach to distinguish between web robot and human requests to a web server
Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Regardless of their advantages, robots may occupy bandwidth and reduce the performance of web servers. Despite a variety of research, there is no accurate method for classifying huge data ...
Improvement of density-based clustering algorithm using modifying the density definitions and input parameter
Clustering, the grouping of similar samples, is one of the main tasks in data mining. There is a wide variety of clustering algorithms, and density-based clustering is one category of them. Various algorithms have been proposed for this method; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...
A Study of the Problems of the DBSCAN Clustering Algorithm and a Review of the Improvements Proposed for It
Clustering is an important knowledge discovery technique in the database. Density-based clustering algorithms are one of the main methods for clustering in data mining. These algorithms have some special features including being independent from the shape of the clusters, highly understandable and ease of use. DBSCAN is a base algorithm for density-based clustering algorithms. DBSCAN is able to...
Discovery of Web Usage Profiles Using Various Clustering Techniques
The explosive growth of World Wide Web (WWW) has necessitated the development of Web personalization systems in order to understand the user preferences to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are extensively being applied to the Web log data. Clustering techniques are widely...
Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions
Clustering techniques are widely used in “Web Usage Mining” to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where intercluster similarities are...